library(tidyverse)
library(plotly)

ehresp = read.csv("ehresp_2014.csv")

The dataset contains 11,212 responses and 37 variables. We examine BMI as the outcome variable and ask how income level, exercise frequency, access to grocery stores, primary eating time, and secondary eating time contribute to it.

Income level.

income_tbl = 
  ehresp %>%
  # Recode the numeric income codes into labeled categories
  mutate(
    erincome = factor(erincome, levels = c(-1, 1, 2, 3, 4, 5), labels = c("Blank", "Income > 185% of poverty threshold", "Income <= 185% of poverty threshold", "130% of poverty threshold < Income < 185% of poverty threshold", "Income > 130% of poverty threshold", "Income <= 130% of poverty threshold"))
  ) %>%
  # Count respondents per income category (one row per category)
  count(erincome) %>%
  plot_ly(x = ~erincome, y = ~n, color = ~erincome, type = "bar")

income_tbl

The majority of the study subjects are at the highest income level, with income greater than 185% of the poverty threshold. The second-largest group consists of subjects whose income is at or below 130% of the poverty threshold. Only 36 study subjects fall in the "Income > 130% of poverty threshold" category.
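The recode-and-count step behind the bar chart can be sketched on a small toy vector. The codes below are hypothetical values, not taken from ehresp_2014.csv, and the shortened labels are used only to keep the example readable. Base R's `table()` is used here instead of `count()` so the sketch runs without extra packages.

```r
# Toy erincome codes (hypothetical, for illustration only)
codes <- c(1, 1, 5, -1, 3, 1, 5, 2)

# Same level order as in the report; labels shortened for this sketch
income_levels <- c(-1, 1, 2, 3, 4, 5)
income_labels <- c("Blank", "> 185%", "<= 185%", "130% to 185%", "> 130%", "<= 130%")

# factor() maps each numeric code to its label
income <- factor(codes, levels = income_levels, labels = income_labels)

# table() tallies respondents per category and keeps empty levels,
# so a category nobody falls into is reported with a count of 0
income_counts <- table(income)
print(income_counts)
```

Counting on the recoded column yields one row per income category, which is the shape a bar chart of category sizes needs.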

BMI, height, and weight.

ehresp %>%
   select(erbmi, euhgt, euwgt) %>%
   filter(erbmi > 0, euhgt > 0, euwgt > 0) %>%
   summary() %>%
   format(scientific = FALSE, digits = 2)
##      erbmi             euhgt             euwgt        
##  "Min.   :13.00  " "Min.   :56.00  " "Min.   : 98.0  "
##  "1st Qu.:23.60  " "1st Qu.:64.00  " "1st Qu.:145.0  "
##  "Median :26.60  " "Median :66.00  " "Median :170.0  "
##  "Mean   :27.77  " "Mean   :66.69  " "Mean   :176.3  "
##  "3rd Qu.:30.70  " "3rd Qu.:70.00  " "3rd Qu.:200.0  "
##  "Max.   :73.60  " "Max.   :77.00  " "Max.   :340.0  "

Among study subjects with positive values for BMI, height, and weight, the mean height is 66.69 inches and the mean weight is 176.3 pounds.
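The `> 0` filter matters because the data appear to use negative codes as placeholders for missing answers (the income variable, for example, uses -1 for "Blank"). The sketch below uses a hypothetical toy data frame, not the real file, to show what the filter does and how it differs from recoding negative codes to `NA`. That every negative value is a missing-data code is an assumption.

```r
# Toy data (hypothetical values); negative entries stand in for missing-data codes
toy <- data.frame(
  erbmi = c(22.5, -1, 31.2, 27.0),
  euhgt = c(65, 70, -2, 68),
  euwgt = c(135, 180, 210, 178)
)

# Approach used in the report: keep only rows where all three values are positive
valid <- subset(toy, erbmi > 0 & euhgt > 0 & euwgt > 0)
n_valid <- nrow(valid)
mean_bmi_filtered <- mean(valid$erbmi)

# Alternative: recode negative codes to NA and summarise each column separately
toy_na <- toy
toy_na[toy_na < 0] <- NA
col_means <- colMeans(toy_na, na.rm = TRUE)

print(n_valid)
print(mean_bmi_filtered)
print(col_means)
```

The two approaches can give different means: the row filter drops a subject entirely if any one of the three values is missing, while the `NA` recode keeps that subject's other measurements.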

The mean BMI is 27.77 and the third quartile is 30.7, while the maximum BMI is 73.6. We suspect there might be some BMI outliers on the upper end.
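One way to make this suspicion concrete is the common 1.5 x IQR rule, which is also the default whisker rule in `geom_boxplot()`. The sketch below applies it to the quartiles printed in the summary above. It works from those rounded summary values rather than the raw data, so fences computed from the full dataset may differ slightly.

```r
# Quartiles and extremes as printed in the summary output
q1      <- 23.6
q3      <- 30.7
bmi_min <- 13.0
bmi_max <- 73.6

# Interquartile range and the 1.5 * IQR fences
iqr         <- q3 - q1
upper_fence <- q3 + 1.5 * iqr
lower_fence <- q1 - 1.5 * iqr

# Does the reported maximum or minimum fall outside the fences?
max_is_outlier <- bmi_max > upper_fence
min_is_outlier <- bmi_min < lower_fence

print(c(lower = lower_fence, upper = upper_fence))
print(c(max_is_outlier = max_is_outlier, min_is_outlier = min_is_outlier))
```

With these inputs the upper fence is about 41.35, far below the maximum of 73.6, while the minimum of 13.0 sits just above the lower fence of about 12.95. This is consistent with outliers being concentrated on the upper end.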

ehresp %>%
  filter(erbmi > 0, euhgt > 0, euwgt > 0) %>%
  ggplot(aes(x = "", y = erbmi)) + 
  geom_boxplot()

The boxplot shows that there are more outliers on the upper end of BMI than on the lower end.